



International Journal of Trend in Scientific 
Research and Development (IJTSRD) 
International Open Access Journal 



ISSN No: 2456 - 6470 | www.ijtsrd.com | Volume - 2 | Issue - 3 


♦ 

♦ 


De-Identification Technology Involves Web Mining 


G. Sneka 

B.E (IV-CSE), Ponnaiyah 
Ramajayam College of 
Engineering and Technology, 
Thanjavur, Tamil Nadu, India 


H. Sharmila 

B.E (IV-CSE), Ponnaiyah 
Ramajayam College of 
Engineering and Technology, 
Thanjavur, Tamil Nadu, India 


R. Banumathi 

Assistant Professor, Ponnaiyah 
Ramajayam College of 
Engineering and Technology, 
Thanjavur, Tamil Nadu, India 


ABSTRACT 

Many data owners are required to release variety of 
data in real world application it has vital importance 
of discovery valuable information stay behind the 
data. However existing we focus on the de¬ 
identification policy which is the common privacy 
preserving approach. By using de-identification policy 
a continues balance between privacy protection and 
data utility can be achieved by choosing the 
appropriate. We propose one parallel algorithm “SKY 
FILTER POLICY GENERATOR” that can be filtered 
the optimized data” web ranking algorithm” provide 
the user preferred data. Each user has their own 
privacy username and password, through the sky filter 
(data analytic operator) user willing data given to the 
user. Each user is identified by de identification 
policy. 

Keywords: sky filter policy generator, web ranking 
algorithm, data analytic 

INTRODUCTION 

Many data owners are required to release the data in a 
variety of real world application, since it is of vital 
importance to discovery valuable information stay 
behind the data. However, existing re-identification 
attacks on the AOL and ADULTS datasets have 
shown that publish such data directly may cause 
tremendous threads to the individual privacy. Thus, it 
is urgent to resolve all kinds of re-identification risks 
by recommending effective de-identification policies 
to guarantee both privacy and utility of the data. De¬ 
identification policies is one of the models that can be 


used to achieve such requirements, however, the 
number of de-identification policies is exponentially 
large due to the broad domain of quasi-identifier 
attributes. To better control the tradeoff between data 
utility and data privacy, skyline computation can be 
used to select such policies, but it is yet challenging 
for efficient skyline processing over large number of 
policies. We propose one parallel algorithm called 
SKY-FILTER-MR, which is based on Map Reduce to 
overcome this challenge by computing skyline large- 
scale de-identification policies, is represented by bit¬ 
string. To provide sufficient background knowledge 
for our work, we discuss research efforts in privacy 
preserving data publication, risk and utility cost, 
skyline queries with a special focus on parallel 
processing, and discovery of deidentification policies. 
To use the De-Identification technology for optimized 
willing data given to the user through the sky filter 
policy generator. To better control the tradeoff 
between data utility and data privacy, skyline 
computation can be used to select such policies, but it 
is yet challenging for efficient skyline processing over 
large number of policies. We propose one parallel 
algorithm called” SKY-FILTER POLICY 
GENERATOR”. The optimal balance between data 
privacy and data utility has been paid more and more 
attention. To meet the multilevel needs of users for 
privacy and data utility, the discovery of de¬ 
identification policies is of paramount importance. To 
provide sufficient background knowledge for our 
work, we discuss research efforts in privacy 
preserving data publication, risk and utility cost, 
skyline queries with a special focus on parallel 
processing, and discovery of deidentification policies. 
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DESIGN 

Web Services 

Adapter module prepared in creating an example 
Stateless adapter module. The Web Services 
module will have the following properties: 
package: com.acompany.wespa Web Services 
module will use the interfaces defined for the 
example Stateless .my sc type Interface used by the 
client, acting on the Web Services module: My 
Service Capability Interface used by the Web 
Services module, acting on the client: My Service 
Capability Listener Method implemented by the 
Web Services module for application-initiated 
requests, defined in the My Service 
Capability interface: my Method Req enable 
Network Triggered Events disable Network 
Triggered Events Methods invoked by the Web 
Services module for application-initiated requests, 
defined in the My Service Capability 
Listener interface, and implemented by the client: 
my Method Res my Method End deactivate 
Interface used by the Web Services module, acting 
on the client: My Service Capability Network 
Triggered Event Listener Methods invoked by the 
Web Services module for network-triggered 
requests, defined in the My Service Capability 
Network Triggered Event Listener interface, and 
implemented by the client :my Deliver Network 
Triggered Event Method deactivate 

Access Centre 

Overview of database systems 

including database design, Entity Relationship data 
modeling, the relational model of data and SQL, as 
well as an overview of some database products. 

View 

The views module allows adminis-trators and site 
designers to create, manage, and display lists of 
content. Each list managed by the views module is 
known as a "view", and the output of a view is known 
as a "display". Displays are provided in either block 
or page form, and a single view. 

Multi end client 

The purpose of this module is to provide 
the user interface and view functions for the system. 
This is the software with which the user directly 
interacts. It communicates with the server to retrieve 
and modify persistent data when necessary. 

Policy Generator 

Policy Generator assists administrators in describing 
role-based policies with browsing resource 


information and user information, and it stores 
the descriptions in the form of XACML in Policy 
Repository. 



IMPLEMENTATION 

Here we proposed the system de- identification 
technology involves web mining the information are 
being connected with person identity which are give 
the user willing optimized data through the” SKY 
FILTER POLICY GENERATOR” through the 
algorithm called the “web ranking algorithm” can 
analyze the how much time the user can view the data 
and what kind of data they view. 



Figure: Graph for sky filter 

Policy generator can filter the user willing optimized 
data given to the user 

RELATED WORK 

APPLET: a privacy-preserving framework for 
location-aware recommender system. (Xindi MAI,2, 
Hui LI2, Jianfeng MAI,2, Qi JIANG2, Sheng GA03, 
Ning XI2* & Di LU1 OCT -2016) Location-aware 
recommender systems that use location-based ratings 
to produce recommendations have recently 
experienced a rapid development and draw significant 
attention from the research community. However, 
current work mainly focused on high-quality 
recommendations while underestimating privacy 
issues, which can lead to problems of privacy. Such 
problems are more prominent when service providers, 
who have limited computational and storage 
resources, leverage on cloud platforms to fit in with 
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the tremendous number of service requirements and 
users. In this paper, we propose a novel framework, 
namely APPLET, for protecting user privacy 
information, including locations and recommendation 
results, within a cloud environment. Through this 
framework, all historical ratings are stored and 
calculated in ciphertext, allowing us to securely 
compute the similarities of venues through Paillier 
encryption, and predict the recommendation results 
based on Paillier, commutative, and comparable 
encryption. We also theoretically prove that user 
information is private and will not be leaked during a 
recommendation. 

Efficient Discovery of De-identification Policies 
Through a Risk-Utility Frontier. ( BradelyMalin , 
Jiuyong Li, Raymonds Heatherly, Weiyi Xia, 
Xiaofeng Ding) Modem information technologies 
enable organizations to capture large quantities of 
person-specific data while providing routine services. 
Many organizations hope, or are legally required, to 
share such data for secondary purposes (e.g., 
validation of research findings) in a de-identified 
manner. In previous work, it was shown de¬ 
identification policy alternatives could be modeled on 
a lattice, which could be searched for policies that met 
a pre specified risk threshold (e.g., likelihood of re- 
identification). However, the search was limited in 
several ways. First, its definition of utility was 
syntactic - based on the level of the lattice - and not 
semantic - based on the actual changes induced in the 
resulting data. Second, the threshold may not be 
known in advance. The goal of this work is to build 
the optimal set of policies that trade-off between 
privacy risk (R) and utility (U), which we refer to as a 
R-U frontier. To model this problem, we introduce a 
semantic definition of utility, based on information 
theory, that is compatible with the lattice 
representation of policies. To solve the problem, we 
initially build a set of policies that define a frontier. 
We then use a probability guided heuristic to search 
the lattice for policies likely to update the frontier. To 
demonstrate the effectiveness of our approach, we 
perform an empirical analysis with the Adult dataset 
of the UCI Machine Learning Repository. We show 
that our approach can construct a frontier closer to 
optimal than competitive approaches by searching a 
smaller number of policies. In addition, we show that 
a frequently followed de-identification policy (i.e., the 
Safe Harbor standard of the HIP A A Privacy Rule) is 
suboptimal in comparison to the frontier discovered 
by our approach. 


A Simple and Practical Algorithm for Differentially 
Private Data Release (Moritz Hardt, Katrina 
Ligett,Frank Mcsherry) We present a new algorithm 
for differentially private data release, based on a 
simple combination of the Exponential Mechanism 
with the Multiplicative Weights update rule. Our 
MWEM algorithm achieves what are the best known 
and nearly optimal theoretical guarantees, while at the 
same time being simple to implement and 
experimentally more accurate on actual data sets than 
existing techniques. Fault tolerance, locality 
optimization and load balancing. Second, a large 
variety of problems are easily expressible as Map 
Reduce computations. For example, Map Reduce is 
used for the generation of data for Google's 
production web search service, for sorting, for data 
mining, for machine learning, and many other 
systems. Third, we have developed an implementation 
of Map Reduce that scales to large clusters of 
machines comprising thousands of machines. 
Sensitive statistical data on individuals are ubiquitous, 
and publishable analysis of such private data is an 
important objective. When releasing statistics or 
synthetic data based on sensitive data sets, one must 
balance the inherent tradeoff between the usefulness 
of the released information and the privacy of the 
affected individuals. Against this backdrop, 
differential privacy [4, 11, 5] has emerged as a 
compelling privacy definition that allows one to 
understand this tradeoff via formal, provable 
guarantees. In recent years, the theoretical literature 
on differential privacy has provided a large repertoire 
of techniques for achieving the definition in a variety 
of settings . However, data analysts have found. 

PrivBayes: Private Data Release via Bayesian 
Networks Cecilia M. Procopiuc, Divesh Srivastava, 
Jun Zhang,Graham Cormode, Xiaoki ) Privacy¬ 
preserving data publishing is an important problem 
that has been the focus of extensive study. The state- 
of-the-art goal for this problem is differential privacy, 
which offers a strong degree of privacy protection 
without making restrictive assumptions about the 
adversary. Existing techniques using differential 
privacy, however, cannot effectively handle the 
publication of high-dimensional data. In particular, 
when the input dataset contains a large number of 
attributes, existing methods require injecting a 
prohibitive amount of noise compared to the signal in 
the data, which renders the published data next to 
useless. To address the deficiency of the existing 
methods, this paper presents PRIVBAYES, a 
differentially private method for releasing high- 
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dimensional data. Given a dataset D, PRIVBAYES 
first constructs a Bayesian network N , which (i) 
provides a succinct model of the correlations among 
the attributes in D and (ii) allows us to approximate 
the distribution of data in D using a set P of low 
dimensional marginal’s of D. After that, 
PRIVBAYES injects noise into each marginal in P to 
ensure differential privacy, and then uses the noisy 
marginal’s and the Bayesian network to construct an 
approximation of the data distribution in D. Finally, 
PRIVBAYES samples tulles from the approximate 
distribution to construct a synthetic dataset, and then 
releases the synthetic data. Intuitively, PRIVBAYES 
circumvents the curse of dimensionality, as it injects 
noise into the low-dimensional marginal’s in P instead 
of the high dimensional dataset D. Private 
construction of Bayesian networks turns out to be 
significantly challenging, and we introduce a novel 
approach that uses a surrogate function for mutual 
information to build the model more accurately. We 
experimentally evaluate PRIVBAYES on real data, 
and demonstrate that it sig 

A Data-and Workload-Aware Algorithm for Range 
Queries Under Differential Privacy.(Chao Li , 
Gerome Miklau , Michael Hay, Yue Wang) We 
describe a new algorithm for answering a given set of 
range queries under -differential privacy which often 
achieves substantially lower error than competing 
methods. Our algorithm satisfies differential privacy 
by adding noise that is adapted to the input data and to 
the given query set. We first privately learn a 
partitioning of the domain into buckets that suit the 
input data well. Then we privately estimate counts for 
each bucket, doing so in a manner well-suited for the 
given query set. Since the performance of the 
algorithm depends on the input database, we evaluate 
it on a wide range of real datasets, showing that we 
can achieve the benefits of data-dependence on both 
“easy” and “hard” databases. Differential privacy [8, 
9] has received growing attention in the research 
community because it offers both an intuitively 
appealing and mathematically precise guarantee of 
privacy. In this paper we study batch (or non¬ 
interactive) query answering of range queries under - 
differential privacy. The batch of queries, which we 
call the workload, is given as input and the goal of 
research in this area is to devise differentially private 
mechanisms that offer the lowest error. 

CONCLUSION 

We study the recommendation on a great number of 
de-identification policies using Map Reduce. Firstly, 


we put forward an effective way of policy generation 
on the basis of newly proposed definition, which can 
decreases the time of generating policies and the size 
of alternative policy set dramatically. Secondly, we 
propose SKY-FILTER-POLICY GENERATOR, 
which is are analyze the user willing optimized data 
through the database we propose algorithm called web 
ranking algorithm used to analyze the user preferred 
data , to answer skyline de-identification policies 
efficiently. 

FUTURE ENHANCEMENT 

To further improve the performance, a novel 
approximate skyline computation scheme was 
proposed to prune unqualified policies using the 
approximately domination relationship with 
approximate skyline, the power of filtering in the 
policy space generation stage was greatly 
strengthened to effectively decrease the cost of 
skyline computation over alternative policies. 
Extensive experiments over both real life and 
synthetic datasets demonstrate that our proposed 
SKY-FILTER-POLICY GENERATOR algorithm 
substantially outperforms the baseline approach by 
faster in the optimal case, which indicates good 
scalability over large policy sets. 
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